regularization matrix


High-order Regularization for Machine Learning and Learning-based Control

arXiv.org Artificial Intelligence

The paper proposes a novel regularization procedure for machine learning. The proposed high-order regularization (HR) offers new insight into regularization, which is widely used to train neural networks, for example networks that approximate the action-value function in general reinforcement learning problems. The proposed HR method ensures provable convergence of the approximation algorithm, which establishes a much-needed connection between regularization and explainable learning with neural networks. We provide lower and upper bounds on the error of the proposed HR solution, which help build a reliable model. We also find that regularization with the proposed HR can be regarded as a contraction. We prove that the generalizability of neural networks can be maximized with a proper regularization matrix, and that the proposed HR is applicable to neural networks with any mapping matrix. Combining the theoretical explanation of the extreme learning machine (ELM) for neural network training with the proposed HR, one can better interpret the output of the neural network, which leads to explainable learning. We present a case study based on regularized extreme learning neural networks to demonstrate the application of the proposed HR and give the corresponding incremental HR solution. We verify the performance of the proposed HR method by solving a classic control problem in reinforcement learning. The results demonstrate the method's superior performance, with a significant improvement in the generalizability of the neural network.

Regularization in machine learning is often used to improve the generalizability of a neural network model: a regularization method typically penalizes some properties of the model to avoid overfitting the training data and to generalize better to unseen test data [1]-[3]. The penalty terms can be designed to reduce the complexity of a given model, and the resulting regularized model can achieve similar or even better performance [4].
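To make the ELM case study concrete, the sketch below fits a ridge-regularized (Tikhonov) extreme learning machine: the hidden weights are random and frozen, and only the output weights are solved in closed form. This is the standard regularized ELM that such a case study builds on, not the high-order regularization itself; the function names, the tanh activation, and the penalty value lam are illustrative assumptions.

```python
import numpy as np

def elm_fit(X, T, n_hidden=50, lam=1e-2, seed=0):
    """Fit a ridge-regularized extreme learning machine (illustrative sketch).

    Hidden weights are drawn at random and never trained; only the output
    weights beta are solved in closed form with a Tikhonov (L2) penalty lam.
    """
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(X.shape[1], n_hidden))   # random input weights (frozen)
    b = rng.normal(size=n_hidden)                 # random biases (frozen)
    H = np.tanh(X @ W + b)                        # hidden-layer output matrix
    # beta = (H^T H + lam I)^{-1} H^T T
    A = H.T @ H + lam * np.eye(n_hidden)
    beta = np.linalg.solve(A, H.T @ T)
    return W, b, beta

def elm_predict(model, X):
    W, b, beta = model
    return np.tanh(X @ W + b) @ beta

# Usage: fit a noisy 1-D sine curve and measure error against the clean curve
rng = np.random.default_rng(1)
X = np.linspace(-3, 3, 200).reshape(-1, 1)
T = np.sin(X) + 0.1 * rng.normal(size=X.shape)
model = elm_fit(X, T)
mse = float(np.mean((elm_predict(model, X) - np.sin(X)) ** 2))
```

Increasing lam shrinks the norm of beta for the same random hidden layer, which is the usual complexity-versus-fit trade-off that the penalty controls.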


Controlled Low-Rank Adaptation with Subspace Regularization for Continued Training on Large Language Models

arXiv.org Artificial Intelligence

Large language models (LLMs) exhibit remarkable capabilities in natural language processing but suffer from catastrophic forgetting when learning new tasks: adaptation to a new domain leads to a substantial decline in performance on previous tasks. In this paper, we propose Controlled LoRA (CLoRA), a subspace regularization method on the LoRA structure. Aiming to reduce the scale of output change while introducing minimal constraints on model capacity, CLoRA imposes constraints on the direction of the null space of the update matrix. Experimental results on commonly used LLM fine-tuning tasks show that CLoRA significantly outperforms existing LoRA-based methods in both in-domain and out-of-domain evaluations, highlighting the superiority of CLoRA as an effective parameter-efficient fine-tuning method that mitigates catastrophic forgetting. Further investigation of the model parameters indicates that CLoRA effectively balances the trade-off between model capacity and the degree of forgetting.
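The abstract does not spell out CLoRA's loss, so the following is only a hypothetical sketch of what constraining the null space of a LoRA update can look like. It assumes a fixed matrix P with orthonormal columns and a penalty ||A P||_F^2 on the LoRA down-projection A; driving that penalty to zero puts every column of P in the null space of the update Delta W = B A, so inputs in span(P) cause no change in the layer output. P, the penalty, and the plain gradient loop are assumptions for illustration, not the authors' method.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, r, k = 16, 8, 4, 6

W0 = rng.normal(size=(d_out, d_in))      # frozen pretrained weight (unused by the penalty)
A = rng.normal(size=(r, d_in)) * 0.1     # LoRA down-projection (trainable)
B = rng.normal(size=(d_out, r)) * 0.1    # LoRA up-projection (trainable)
# Hypothetical fixed directions with orthonormal columns that should lie in null(B @ A)
P, _ = np.linalg.qr(rng.normal(size=(d_in, k)))

def null_space_penalty(A, P):
    """||A P||_F^2: zero iff every column of P lies in null(A), hence in null(B @ A)."""
    return float(np.sum((A @ P) ** 2))

def penalty_grad(A, P):
    # Gradient of ||A P||_F^2 with respect to A is 2 A P P^T
    return 2.0 * A @ P @ P.T

lr = 0.25
for _ in range(50):
    # In real fine-tuning this gradient would be added to the task-loss gradient;
    # here only the penalty is minimized to show its effect.
    A -= lr * penalty_grad(A, P)

x = P @ rng.normal(size=k)               # an input lying in span(P)
delta_out = (B @ A) @ x                  # change in layer output caused by the LoRA update
```

Because P has orthonormal columns, each step halves the component of A along span(P) and leaves the orthogonal component untouched, so the update keeps its capacity in the remaining directions.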


High-order regularization dealing with ill-conditioned robot localization problems

arXiv.org Artificial Intelligence

In this work, we propose a high-order regularization method to solve ill-conditioned problems in robot localization. Numerical solutions to robot localization problems are often unstable when the problems are ill-conditioned. A typical remedy for ill-conditioned problems is regularization, and a classical regularization method is Tikhonov regularization. We show that Tikhonov regularization can be seen as a low-order case of our method. We find that the proposed method is superior to Tikhonov regularization in approximating some ill-conditioned inverse problems, such as robot localization problems. The proposed method overcomes the over-smoothing problem of Tikhonov regularization because it can use more than one term in the approximation of the matrix inverse, and we give an explanation for the over-smoothing of Tikhonov regularization. Moreover, an a priori criterion that improves the numerical stability of the ill-conditioned problem is proposed to obtain an optimal regularization matrix. As most regularization solutions are biased, we also provide two bias-correction techniques for the proposed high-order regularization. Simulation and experimental results using a sensor network in a 3D environment are discussed, demonstrating the performance of the proposed method.
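The point that a higher-order method can use more than one term in approximating the matrix inverse can be illustrated with iterated Tikhonov regularization, a well-known technique swapped in here for illustration (it is not necessarily the authors' HR). With M = A^T A + lam I, k iterations give x_k = sum_{j=0}^{k-1} (lam M^{-1})^j M^{-1} A^T b, so order 1 is ordinary Tikhonov and each extra term removes part of its smoothing bias. The test matrix, noise level, and lam below are arbitrary choices.

```python
import numpy as np

def tikhonov(A, b, lam):
    """Standard Tikhonov solution x = (A^T A + lam I)^{-1} A^T b."""
    n = A.shape[1]
    return np.linalg.solve(A.T @ A + lam * np.eye(n), A.T @ b)

def iterated_tikhonov(A, b, lam, order):
    """Iterated Tikhonov of a given order; order=1 reproduces tikhonov().

    Each iteration re-solves the regularized system on the current residual,
    adding one more term of the series that approximates (A^T A)^{-1} A^T b.
    """
    n = A.shape[1]
    M = A.T @ A + lam * np.eye(n)
    x = np.zeros(n)
    for _ in range(order):
        x = x + np.linalg.solve(M, A.T @ (b - A @ x))
    return x

# Usage: an ill-conditioned system with singular values from 1 down to 1e-6
rng = np.random.default_rng(0)
n = 10
U, _ = np.linalg.qr(rng.normal(size=(n, n)))
V, _ = np.linalg.qr(rng.normal(size=(n, n)))
sigma = np.logspace(0, -6, n)
A = U @ np.diag(sigma) @ V.T
x_true = np.ones(n)
b = A @ x_true + 1e-6 * rng.normal(size=n)
lam = 1e-4
err1 = np.linalg.norm(iterated_tikhonov(A, b, lam, order=1) - x_true)
err3 = np.linalg.norm(iterated_tikhonov(A, b, lam, order=3) - x_true)
```

In terms of singular values, order k applies the filter factor 1 - (lam / (sigma^2 + lam))^k, which is closer to 1 than the order-1 factor sigma^2 / (sigma^2 + lam); this is why components with sigma^2 near lam are smoothed less.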


Stabilizing Machine Learning Prediction of Dynamics: Noise and Noise-inspired Regularization

arXiv.org Artificial Intelligence

Recent work has shown that machine learning (ML) models can be trained to accurately forecast the dynamics of unknown chaotic dynamical systems. Short-term predictions of the state evolution and long-term predictions of the statistical patterns of the dynamics ("climate") can be produced with a feedback loop: the model is trained to predict one time step forward, and its output is then fed back as input over multiple time steps. Without mitigating techniques, however, this approach can produce artificially rapid error growth. In this article, we systematically examine the technique of adding noise to the ML model input during training to promote stability and improve prediction accuracy. Furthermore, we introduce Linearized Multi-Noise Training (LMNT), a regularization technique that deterministically approximates the effect of many small, independent noise realizations added to the model input during training. Our case study uses reservoir computing, a machine learning method based on recurrent neural networks, to predict the spatiotemporal chaotic Kuramoto-Sivashinsky equation. We find that reservoir computers trained with noise or with LMNT produce climate predictions that appear to be indefinitely stable and have a climate very similar to that of the true system, whereas reservoir computers trained without regularization are unstable. Compared with other regularization techniques that yield stability in some cases, we find that both short-term and climate predictions from reservoir computers trained with noise or with LMNT are substantially more accurate. Finally, we show that the deterministic nature of LMNT regularization enables faster hyperparameter tuning than training with noise.
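The idea behind LMNT can be sketched for a linear readout trained by ridge regression on a fixed nonlinear feature map. If the input u is perturbed by eps * xi with xi ~ N(0, I) and the features are linearized, r(u + eps*xi) ~ r(u) + eps * J(u) xi, the expected training loss gains the deterministic penalty eps^2 * sum_t ||W J_t||_F^2, which has a closed-form minimizer. This is a simplified illustration under those assumptions: the feedforward map tanh(M u + c) stands in for a reservoir state, the toy data are made up, and the article's exact LMNT formulation may differ.

```python
import numpy as np

def features(U, M, c):
    """Hypothetical feature map r(u) = tanh(M u + c); columns of U are inputs."""
    return np.tanh(M @ U + c[:, None])

def feature_jacobians(U, M, c):
    """Jacobian dr/du at each input: diag(1 - tanh^2) @ M, shape (T, n_feat, n_in)."""
    s = 1.0 - features(U, M, c) ** 2           # (n_feat, T)
    return s.T[:, :, None] * M[None, :, :]

def lmnt_readout(U, Y, M, c, eps, beta):
    """Minimize sum_t ||W r_t - y_t||^2 + eps^2 sum_t ||W J_t||_F^2 + beta ||W||_F^2.

    The middle term is the first-order (linearized) expectation of the extra
    loss caused by input noise eps*xi, xi ~ N(0, I).
    """
    R = features(U, M, c)                       # (n_feat, T)
    J = feature_jacobians(U, M, c)              # (T, n_feat, n_in)
    S = np.einsum('tij,tkj->ik', J, J)          # sum_t J_t J_t^T
    G = R @ R.T + eps**2 * S + beta * np.eye(R.shape[0])
    # Normal equations: W G = Y R^T; G is symmetric, so solve G W^T = R Y^T
    return np.linalg.solve(G, R @ Y.T).T

def input_sensitivity(W, U, M, c):
    """sum_t ||W J_t||_F^2: how strongly the readout output reacts to input perturbations."""
    J = feature_jacobians(U, M, c)
    return float(np.sum((W[None] @ J) ** 2))

# Usage on toy data (targets have the same dimension as the input)
rng = np.random.default_rng(0)
n_in, n_feat, T = 3, 40, 300
M = rng.normal(size=(n_feat, n_in))
c = rng.normal(size=n_feat)
U = rng.normal(size=(n_in, T))
Y = np.sin(U)
W_ridge = lmnt_readout(U, Y, M, c, eps=0.0, beta=1e-3)
W_lmnt = lmnt_readout(U, Y, M, c, eps=0.05, beta=1e-3)
```

Because the penalty replaces an average over many noise realizations with a single matrix term, trying a new eps only costs one linear solve, which is consistent with the faster hyperparameter tuning described above.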


Interpretable, similarity-driven multi-view embeddings from high-dimensional biomedical data

arXiv.org Machine Learning

Inter-modality covariation leveraged as a scientific principle can inform the development of novel hypotheses and increase statistical power in the analysis of diverse data. We present similarity-driven multi-view linear reconstruction (SiMLR), an algorithm that exploits inter-modality relationships to transform large scientific datasets into smaller, better-powered and more interpretable low-dimensional spaces. Novel aspects of this methodology include its objective function for identifying joint signal, an efficient approach based on sparse matrices for representing prior within-modality relationships, and an efficient implementation that allows SiMLR to be applied to relatively large datasets with multiple modalities, each of which may have millions of entries. We first describe and contextualize SiMLR theory and implementation strategies. We then illustrate the method on simulated data to establish its expected performance. Subsequently, we present succinct SiMLR case studies on publicly accessible example datasets and compare with related methods. Lastly, we use SiMLR to derive a neurobiological embedding from three types of measurements: two from structural neuroimaging, complemented by single nucleotide polymorphisms (SNPs) from 44 depression- and anxiety-related loci. We find that, in a validation dataset, the low-dimensional space learned from the training set exhibits above-chance relationships with clinical measurements of anxiety and, to a lesser degree, depression. The results suggest that SiMLR can derive a low-dimensional representation space that, in suitable datasets, may be clinically relevant. Taken together, these results show that SiMLR may be applied with default parameters to joint signal estimation from disparate modalities and may yield practically useful results.


Data-Driven Impulse Response Regularization via Deep Learning

arXiv.org Machine Learning

Impulse response estimation has long been at the core of system identification. Until some five to seven years ago, the generally held belief in the field was that we knew all there was to know about this topic. However, the enlightening work by Pillonetto and De Nicolao [2010] changed this by showing that the estimate can in fact be improved significantly by assuming a Gaussian Process (GP) prior over the impulse response, which acts as a regularizer. This model-driven approach has since been further refined [Pillonetto et al., 2011, Chen et al., 2012, Pillonetto et al., 2014], and the prior can be interpreted as encoding not only smoothness information but also information about the exponential decay of the impulse response. In this paper we employ deep learning (DL) to find a suitable regularizer via a data-driven method. Deep learning is a fairly new area of research that continues the work on neural networks from the 1990s. For a brief but informative overview of the field we recommend the paper by LeCun et al. [2015], and for a more complete snapshot we refer to the monograph by Goodfellow et al. [2016]. Deep learning has recently revolutionized several fields, including image recognition (e.g.
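For comparison with the data-driven approach, the model-driven baseline from the cited line of work can be sketched as kernel-regularized FIR estimation: with a GP prior g ~ N(0, K) and noise variance sigma2, the estimate is the posterior mean K Phi^T (Phi K Phi^T + sigma2 I)^{-1} y. The sketch uses the TC (tuned/correlated) kernel K_ij = c * lam^max(i, j), which encodes both smoothness and exponential decay. The kernel hyperparameters are fixed by hand rather than estimated by marginal likelihood, and the simulated system is made up for illustration; this is not the paper's deep-learning regularizer.

```python
import numpy as np

def tc_kernel(n, c=1.0, lam=0.8):
    """TC kernel K[i, j] = c * lam**max(i, j) with lags i, j = 1..n and 0 < lam < 1.

    Prior variance decays with the lag (exponential decay) and neighbouring
    coefficients are correlated (smoothness).
    """
    idx = np.arange(1, n + 1)
    return c * lam ** np.maximum.outer(idx, idx)

def regression_matrix(u, n):
    """Phi[t, k] = u[t - k] (zero when t - k < 0), so that y = Phi @ g for an FIR of length n."""
    N = len(u)
    Phi = np.zeros((N, n))
    for k in range(n):
        Phi[k:, k] = u[:N - k]
    return Phi

def kernel_fir_estimate(u, y, n, sigma2, K):
    """Posterior mean of g: K Phi^T (Phi K Phi^T + sigma2 I)^{-1} y."""
    Phi = regression_matrix(u, n)
    S = Phi @ K @ Phi.T + sigma2 * np.eye(len(y))
    return K @ Phi.T @ np.linalg.solve(S, y)

# Usage: a decaying, oscillating "true" impulse response observed in heavy noise
rng = np.random.default_rng(0)
n, N = 50, 100
k = np.arange(n)
g0 = 0.85 ** k * np.sin(0.4 * (k + 1))
u = rng.normal(size=N)
Phi = regression_matrix(u, n)
y = Phi @ g0 + 1.0 * rng.normal(size=N)
g_ls = np.linalg.lstsq(Phi, y, rcond=None)[0]            # unregularized least squares
g_kernel = kernel_fir_estimate(u, y, n, sigma2=1.0, K=tc_kernel(n, c=1.0, lam=0.85))
err_ls = np.linalg.norm(g_ls - g0)
err_kernel = np.linalg.norm(g_kernel - g0)
```

The prior shrinks the long-lag coefficients toward zero, where the true response is already negligible, which is where plain least squares spends most of its variance.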


A Unified Approach to Adaptive Regularization in Online and Stochastic Optimization

arXiv.org Machine Learning

We describe a framework for deriving and analyzing online optimization algorithms that incorporate adaptive, data-dependent regularization, also termed preconditioning. Such algorithms have proven useful in stochastic optimization because they reshape the gradients according to the geometry of the data. Our framework captures and unifies much of the existing literature on adaptive online methods, including the AdaGrad and Online Newton Step algorithms as well as their diagonal versions. As a result, we obtain new convergence proofs for these algorithms that are substantially simpler than previous analyses. Our framework also exposes the rationale for the different preconditioned updates used in common stochastic optimization methods.
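As an example of the adaptive, data-dependent preconditioning this framework covers, here is diagonal AdaGrad in its commonly stated form: each coordinate's step is divided by the square root of its accumulated squared gradients. This is a standalone sketch, not the paper's unified framework; the step size, eps, and the badly scaled quadratic are illustrative choices.

```python
import numpy as np

def adagrad_diag(grad_fn, x0, lr=0.5, eps=1e-8, steps=500):
    """Diagonal AdaGrad: per-coordinate step lr / sqrt(sum of past squared gradients).

    The accumulated squared gradients act as a data-dependent diagonal
    preconditioner that rescales each coordinate independently.
    """
    x = np.asarray(x0, dtype=float).copy()
    G = np.zeros_like(x)                 # running sum of squared gradients
    for _ in range(steps):
        g = grad_fn(x)
        G += g * g
        x -= lr * g / (np.sqrt(G) + eps)
    return x

# Usage: badly scaled quadratic f(x) = 0.5 * sum(d * (x - x_star)**2)
d = np.array([100.0, 1.0, 0.01])
x_star = np.array([1.0, -2.0, 3.0])
grad = lambda x: d * (x - x_star)
x_hat = adagrad_diag(grad, np.zeros(3))
```

On this quadratic, plain gradient descent would need a step size below 2/100 to stay stable along the first coordinate and would then crawl along the third, whereas the diagonal preconditioner makes each coordinate's progress independent of its curvature.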